Clustering for Control YFP

scanpy==1.5.1 anndata==0.7.1 umap==0.3.10 numpy==1.16.5 scipy==1.4.1 pandas==1.0.1 scikit-learn==0.23.1 statsmodels==0.10.1 python-igraph==0.7.1 louvain==0.6.1
'/n/scratch3/groups/hsph/hbc/pjb40/scratch/TimeSeries_10X/data/velocyto_analysis/Only_controlN_Tumor'
AnnData object with n_obs × n_vars = 9199 × 1281 
    obs: 'DAY', 'batch', 'sample', 'n_counts', 'log_counts', 'n_genes', 'percent_mito', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'Clusters', '_X', '_Y', 'initial_size_unspliced', 'initial_size_spliced', 'initial_size', 'sample_batch', 'louvain_r0.01', 'louvain_r0.025', 'louvain_r0.05', 'louvain_r0.1', 'louvain_r0.2', 'louvain_r0.3', 'louvain_r0.4', 'louvain_r0.5', 'velocity_self_transition', 'velocity_length', 'velocity_confidence', 'velocity_confidence_transition', 'root_cells', 'end_points', 'velocity_pseudotime'
    var: 'gene_ids', 'feature_types', 'genome', 'n_cells', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'Accession', 'Chromosome', 'End', 'Start', 'Strand', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'velocity_gamma', 'velocity_r2', 'velocity_genes', 'spearmans_score', 'velocity_score'
    uns: 'DAY_colors', 'diffmap_evals', 'draw_graph', 'louvain', 'louvain_r0.01_colors', 'louvain_r0.025_colors', 'louvain_r0.05_colors', 'louvain_r0.1_colors', 'louvain_r0.2_colors', 'louvain_r0.2_sizes', 'louvain_r0.3_colors', 'louvain_r0.4_colors', 'louvain_r0.5_colors', 'neighbors', 'paga', 'pca', 'rank_genes_groups', 'rank_genes_r0.2', 'rank_velocity_genes', 'sample_colors', 'umap', 'velocity_graph', 'velocity_graph_neg', 'velocity_params'
    obsm: 'X_diffmap', 'X_draw_graph_fa', 'X_pca', 'X_umap', 'velocity_umap'
    varm: 'PCs'
    layers: 'Ms', 'Mu', 'ambiguous', 'counts', 'matrix', 'spliced', 'unspliced', 'variance_velocity', 'velocity'
    obsp: 'connectivities', 'distances'
Plotting the Louvain clusters at different resolutions
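
A minimal sketch of how the per-resolution column names could be assembled for plotting, assuming the `louvain_r<resolution>` naming seen in `obs` above. The helper name `louvain_keys` is hypothetical; the resulting list is the kind of value that scanpy's `sc.pl.umap` accepts through its `color` argument.

```python
# Resolutions taken from the `louvain_r*` columns listed in `obs` above.
RESOLUTIONS = [0.01, 0.025, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5]


def louvain_keys(resolutions):
    """Return the obs column name for each clustering resolution (hypothetical helper)."""
    return [f"louvain_r{r}" for r in resolutions]


keys = louvain_keys(RESOLUTIONS)
print(keys)

# With scanpy imported as `sc` and an AnnData object `adata` loaded,
# the clusterings could then be drawn side by side with:
# sc.pl.umap(adata, color=keys)
```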

Finding marker genes

Calculate marker genes in all clusters
ranking genes
    finished: added to `.uns['rank_genes_r0.4']`
    'names', sorted np.recarray to be indexed by group ids
    'scores', sorted np.recarray to be indexed by group ids
    'logfoldchanges', sorted np.recarray to be indexed by group ids
    'pvals', sorted np.recarray to be indexed by group ids
    'pvals_adj', sorted np.recarray to be indexed by group ids (0:00:19)
   0       1       2       3      4      5       6      7         8       9
0  Scd1    Clu     Ager    Trf    Cbr3   Psca    Cks1b  Por       Ifitm3  Tm4sf1
1  Chil1   S100a6  Emp2    Epas1  Gclc   Taldo1  Stmn1  Selenbp1  Cp      Cd24a
2  Fasn    Krt7    Hopx    Npc2   Cd36   Gsn     Spc24  Cyp2f2    Cbr2    Tuba1a
3  Sftpc   Krt18   Sparc   Hp     Acot7  Fth1    H2afz  Bsg       Jun     Mlf1
4  Elovl1  Anxa1   Cldn18  Lrg1   Gstm1  Esd     Cdc20  Scgb1a1   Gstm2   Hsp90aa1
Plotting the ranked genes in each cluster
Looking at the p-values for all these genes
   0_n     0_p  1_n     1_p  2_n     2_p  3_n    3_p            4_n    4_p            5_n     5_p            6_n    6_p            7_n       7_p            8_n     8_p            9_n       9_p
0  Scd1    0.0  Clu     0.0  Ager    0.0  Trf    1.475098e-250  Cbr3   1.269374e-306  Psca    5.911401e-248  Cks1b  1.213229e-204  Por       3.853670e-201  Ifitm3  1.845965e-167  Tm4sf1    5.060793e-42
1  Chil1   0.0  S100a6  0.0  Emp2    0.0  Epas1  8.553258e-214  Gclc   2.850594e-301  Taldo1  2.000148e-241  Stmn1  3.970988e-186  Selenbp1  8.014067e-187  Cp      1.570805e-166  Cd24a     9.965395e-40
2  Fasn    0.0  Krt7    0.0  Hopx    0.0  Npc2   6.086621e-208  Cd36   2.571392e-299  Gsn     3.475468e-233  Spc24  6.368245e-175  Cyp2f2    1.896420e-177  Cbr2    2.057109e-139  Tuba1a    1.499349e-39
3  Sftpc   0.0  Krt18   0.0  Sparc   0.0  Hp     5.398837e-202  Acot7  1.640370e-241  Fth1    4.298752e-229  H2afz  1.876808e-172  Bsg       1.629614e-173  Jun     9.365414e-137  Mlf1      2.545150e-39
4  Elovl1  0.0  Anxa1   0.0  Cldn18  0.0  Lrg1   3.754112e-191  Gstm1  2.843383e-233  Esd     9.977273e-228  Cdc20  1.909069e-159  Scgb1a1   4.607699e-161  Gstm2   1.890413e-134  Hsp90aa1  1.890572e-38
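
The 0.0 entries in the p-value columns most likely reflect floating-point underflow rather than probabilities that are exactly zero: a double-precision float cannot represent positive values much smaller than about 5e-324. A small sketch of that effect, using made-up numbers rather than values from this dataset:

```python
import sys

# Smallest positive normalized double; values far below this underflow to 0.0.
print(sys.float_info.min)

# Two tiny (made-up) probabilities whose product is too small to represent.
p = 1e-200 * 1e-200
print(p == 0.0)
```

For ranking genes within such clusters, the score or log fold change columns are therefore more informative than the p-values themselves.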
<bound method AnnData.uns_keys of AnnData object with n_obs × n_vars = 9199 × 1281 
    obs: 'DAY', 'batch', 'sample', 'n_counts', 'log_counts', 'n_genes', 'percent_mito', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'Clusters', '_X', '_Y', 'initial_size_unspliced', 'initial_size_spliced', 'initial_size', 'sample_batch', 'louvain_r0.01', 'louvain_r0.025', 'louvain_r0.05', 'louvain_r0.1', 'louvain_r0.2', 'louvain_r0.3', 'louvain_r0.4', 'louvain_r0.5', 'velocity_self_transition', 'velocity_length', 'velocity_confidence', 'velocity_confidence_transition', 'root_cells', 'end_points', 'velocity_pseudotime'
    var: 'gene_ids', 'feature_types', 'genome', 'n_cells', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'Accession', 'Chromosome', 'End', 'Start', 'Strand', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'velocity_gamma', 'velocity_r2', 'velocity_genes', 'spearmans_score', 'velocity_score'
    uns: 'DAY_colors', 'diffmap_evals', 'draw_graph', 'louvain', 'louvain_r0.01_colors', 'louvain_r0.025_colors', 'louvain_r0.05_colors', 'louvain_r0.1_colors', 'louvain_r0.2_colors', 'louvain_r0.2_sizes', 'louvain_r0.3_colors', 'louvain_r0.4_colors', 'louvain_r0.5_colors', 'neighbors', 'paga', 'pca', 'rank_genes_groups', 'rank_genes_r0.2', 'rank_velocity_genes', 'sample_colors', 'umap', 'velocity_graph', 'velocity_graph_neg', 'velocity_params', 'rank_genes_r0.4'
    obsm: 'X_diffmap', 'X_draw_graph_fa', 'X_pca', 'X_umap', 'velocity_umap'
    varm: 'PCs'
    layers: 'Ms', 'Mu', 'ambiguous', 'counts', 'matrix', 'spliced', 'unspliced', 'variance_velocity', 'velocity'
    obsp: 'connectivities', 'distances'>
Bug to fix: the heatmap is drawn from the marker-gene keys of the old resolution, not the new resolution (0.4).
WARNING: Groups are not reordered because the `groupby` categories and the `var_group_labels` are different.
categories: 0, 1, 2, etc.
var_group_labels: 2, 3, 4, etc.
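
One possible way to address the heatmap bug noted above is to pass the resolution-specific keys explicitly instead of relying on defaults. The sketch below only builds the keyword arguments; `heatmap_kwargs` is a hypothetical helper, and the key names follow the `louvain_r0.4` / `rank_genes_r0.4` naming that appears in the outputs above. The assumption is that the plotting call falls back to `.uns['rank_genes_groups']` when no `key` is given.

```python
def heatmap_kwargs(resolution):
    """Build matching groupby/key arguments for one clustering resolution."""
    return {
        "groupby": f"louvain_r{resolution}",  # obs column with the cluster labels
        "key": f"rank_genes_r{resolution}",   # uns entry with the ranking results
    }


kwargs = heatmap_kwargs(0.4)
print(kwargs)

# With scanpy imported as `sc` and an AnnData object `adata` loaded:
# sc.pl.rank_genes_groups_heatmap(adata, n_genes=5, **kwargs)
```

Deriving both names from the same resolution value keeps the cluster labels and the ranking results from drifting apart.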

Stem cell marker

AT2 markers

AT1 markers

AT1 markers are expressed on the right side of the UMAP, while AT2 markers are expressed on the left.

Check the expression of AT1 and AT2 markers

AT1 markers may start to be expressed in cluster 5, then cluster 2, and then cluster 7.

Looking at 'Ager' expression on the diffusion map, which is better suited to finding trajectories.

Checking the expression of AT1 and AT2 markers

Clusters 0, 3, 4 and 6 have high expression of AT2 markers; together they cover essentially the whole left side of the UMAP. On the right side of the UMAP, especially in the middle, clusters 5, 2 and 7 express AT1 markers.

markers for cluster 5

Stromal cells

Club cell markers

Club cell markers: Scgb1a1, Scgb3a2

How many cells are in each cluster


Cluster 0 mostly contains DAY 14 cells, with some DAY 10 cells.
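
Counting cells per cluster and per DAY amounts to a cross-tabulation of two label columns. A minimal sketch with invented labels (the pairs below are not taken from this dataset); with pandas available, `pd.crosstab` on the two `obs` columns would give the same kind of table.

```python
from collections import Counter

# Invented (cluster, DAY) labels for six cells, for illustration only.
cells = [("0", "14"), ("0", "14"), ("0", "10"), ("1", "7"), ("1", "7"), ("2", "10")]

# Count how many cells fall into each (cluster, DAY) combination.
counts = Counter(cells)

# Print one row per cluster with its per-DAY cell counts.
for cluster in sorted({c for c, _ in cells}):
    per_day = {day: n for (c, day), n in counts.items() if c == cluster}
    print(cluster, per_day)
```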

In my opinion, we now see two separate trends in the UMAP for the clusters, i.e. AT1 and AT2. Combinations of clusters that are interesting to look at are clusters 0, 7 and 8; another trend is clusters 1, 2 and 3; and a separate trend is clusters 4, 5 and 6.

AT1 and AT2 cell type differences

Looking at all the markers together for AT1 and AT2 instead of individual genes
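
Looking at a marker set as a whole can be done by averaging the expression of its genes in each cell. The sketch below uses invented expression values, and the grouping of genes into AT1 and AT2 sets is an assumption made for illustration. scanpy's `sc.tl.score_genes` offers a more careful version that also subtracts the average of a reference gene set.

```python
# Invented expression values per cell for a few genes, for illustration only.
expression = {
    "cell_a": {"Ager": 3.0, "Hopx": 2.0, "Sftpc": 0.0, "Fasn": 0.5},
    "cell_b": {"Ager": 0.0, "Hopx": 0.5, "Sftpc": 4.0, "Fasn": 2.0},
}

# Assumed marker sets; replace with the marker lists used in the analysis.
marker_sets = {"AT1": ["Ager", "Hopx"], "AT2": ["Sftpc", "Fasn"]}


def set_score(cell_values, genes):
    """Mean expression of the given genes in one cell."""
    return sum(cell_values[g] for g in genes) / len(genes)


# One score per cell and per marker set.
scores = {
    cell: {name: set_score(values, genes) for name, genes in marker_sets.items()}
    for cell, values in expression.items()
}
print(scores)
```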

Looking at cluster 5, as it appears to be the starting point where cells divide into two paths, producing two separate trends in the clusters. Let's first find out which cell types they are.

      0    1     2    3    4    5    6    7    8     9
Stem  0.0  0.25  0.0  0.0  0.0  0.0  0.0  0.5  0.25  0.0
{'Stem': ['Ly6a', 'Hmga2', 'Sox9', 'Sox2']}

Stem cell markers score highest in cluster 7 (0.5), followed by clusters 1 and 8 (0.25 each). Let's look at which DAY these cells fall in and at the expression of genes in this cluster.
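
The Stem row above looks like a normalized overlap score: the fraction of the reference 'Stem' markers that appear among each cluster's top-ranked genes (0.25 would be one of four markers, 0.5 two of four). This reading is an assumption. A minimal sketch of that computation with invented clusters and gene lists; scanpy's `sc.tl.marker_gene_overlap` provides a comparable calculation.

```python
# Reference markers, as printed in the output above.
reference = {"Stem": ["Ly6a", "Hmga2", "Sox9", "Sox2"]}

# Invented clusters and top-ranked genes, for illustration only.
top_genes = {
    "x": ["GeneA", "GeneB", "Sox9"],
    "y": ["GeneC", "Ly6a", "Sox2"],
    "z": ["GeneD", "GeneE"],
}


def overlap_fraction(markers, ranked):
    """Fraction of reference markers that occur in a cluster's ranked genes."""
    return len(set(markers) & set(ranked)) / len(markers)


# One row per reference cell type, one column per cluster.
table = {
    cell_type: {c: overlap_fraction(markers, genes) for c, genes in top_genes.items()}
    for cell_type, markers in reference.items()
}
print(table)
```

Under this reading, a high score means the cluster's marker list contains reference markers, which is related to, but not the same as, high expression of those genes.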

Reload the object that computed differential expression by comparing to the rest of the groups.

plotting differentially expressed genes in group 5,2,7 vs 1
***checking which genes are differentially expressed between AT1 and AT2 cell types***
plotting differentially expressed genes in group 6,4,3 vs 1
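
The two comparisons named above (groups 5, 2, 7 and groups 6, 4, 3, each against cluster 1) can be described as argument sets for a groups-versus-reference test. The sketch below only assembles those arguments; `de_kwargs` and the `key_added` naming are hypothetical, and `louvain_r0.4` as the grouping column is an assumption. The parameter names match scanpy's `sc.tl.rank_genes_groups`.

```python
def de_kwargs(groups, reference="1", groupby="louvain_r0.4"):
    """Arguments for testing several clusters against one reference cluster."""
    return {
        "groupby": groupby,      # obs column holding the cluster labels (assumed)
        "groups": groups,        # clusters to test
        "reference": reference,  # cluster every group is compared against
        # Hypothetical naming so each comparison is stored under its own key.
        "key_added": "rank_genes_" + "_".join(groups) + f"_vs_{reference}",
    }


comparisons = [de_kwargs(["5", "2", "7"]), de_kwargs(["6", "4", "3"])]
for kw in comparisons:
    print(kw)

# With scanpy imported as `sc` and an AnnData object `adata` loaded:
# sc.tl.rank_genes_groups(adata, **comparisons[0])
```

Storing each comparison under its own key avoids overwriting earlier results in `.uns`.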

Heatmap for marker genes

WARNING: Groups are not reordered because the `groupby` categories and the `var_group_labels` are different.
categories: 0, 1, 2, etc.
var_group_labels: 2, 3, 4, etc.
markers for cluster 5
0      Psca
1    Taldo1
2       Gsn
3      Fth1
4       Esd
5    Lgals3
6    Cystm1
7      Ftl1
8     Srxn1
9     Ccng1
Name: 5_n, dtype: object
markers for cluster 3
0        Trf
1      Epas1
2       Npc2
3         Hp
4       Lrg1
5       Aox3
6     Atp11a
7    Slc34a2
8    Scgb1a1
9     Atp1b1
Name: 3_n, dtype: object

STOP HERE AND DO NOT RUN BELOW. REFER TO THE SUBCLUSTER 2 VELOCITY NOTEBOOK. TO REPEAT THE ABOVE RESULTS, LOAD THE SAVED H5AD FILE: "controls_after_filter_Clustering_velocity_original_resolution_0.4.h5ad"